Importing the necessary libraries:

Loading the imported data into an array:

Preprocessing the image data and saving the input images and output masks in separate arrays: i) All input images are resized and reshaped to a uniform shape of (224, 224, 3), regardless of whether they are 4-channel, 3-channel or 1-channel images. ii) The bounding-box coordinates are scaled to the required image height and width, and binary masks are created from these coordinates: pixels inside the box are set to 1 while the rest are set to 0.
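The mask-construction step can be sketched as follows (a minimal numpy sketch; the box format `(x1, y1, x2, y2)` and the target size 224 are assumptions based on the description above):

```python
import numpy as np

def scale_box(box, orig_w, orig_h, target=224):
    # box = (x1, y1, x2, y2) in original pixel coordinates (hypothetical format)
    x1, y1, x2, y2 = box
    sx, sy = target / orig_w, target / orig_h
    return int(x1 * sx), int(y1 * sy), int(x2 * sx), int(y2 * sy)

def box_to_mask(box, orig_w, orig_h, target=224):
    # Binary mask: pixels inside the scaled box are 1, everything else 0
    x1, y1, x2, y2 = scale_box(box, orig_w, orig_h, target)
    mask = np.zeros((target, target), dtype=np.uint8)
    mask[y1:y2, x1:x2] = 1
    return mask
```
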

Displaying a few samples of the reshaped images and their respective masks:

Splitting the input into train and test data for later use:

Creating the U-Net architecture using MobileNet as the pretrained transfer-learning model: i) The top layers of the pretrained model are frozen (non-trainable) while the bottom layers remain trainable. ii) The downsampled activation layers following convolution are upsampled and concatenated in order to obtain the U-Net architecture and recover the original features that are lost through max pooling and other downsampling methods.

Defining the dice coefficient and loss for the network: the dice coefficient follows the standard definition (2 × intersection divided by the sum of the two areas), while the loss combines binary cross-entropy with the dice loss.
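A numpy sketch of the two definitions (the smoothing constant and the equal weighting of the two loss terms are assumptions; the notebook's Keras implementation would operate on tensors instead):

```python
import numpy as np

def dice_coef(y_true, y_pred, smooth=1.0):
    # 2*|A ∩ B| / (|A| + |B|); smooth avoids division by zero on empty masks
    inter = np.sum(y_true * y_pred)
    return (2.0 * inter + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def bce_dice_loss(y_true, y_pred, eps=1e-7):
    # Binary cross-entropy plus dice loss (1 - dice coefficient)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return bce + (1.0 - dice_coef(y_true, y_pred))
```
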

Compiling the model with the Adam optimizer, saving the best model via a checkpoint, and applying early stopping. ReduceLROnPlateau is used to reduce the learning rate when the loss stops improving.
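The callback setup described here might look like the following (the file name, patience values and reduction factor are assumptions):

```python
from tensorflow.keras.callbacks import (ModelCheckpoint, EarlyStopping,
                                        ReduceLROnPlateau)

callbacks = [
    # Keep only the weights of the best epoch seen so far
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
    # Stop when the monitored loss has not improved for several epochs
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Halve the learning rate on a plateau, down to a floor of 1e-6
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6),
]
```
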

Fitting the data using the model and training it:

Varying the previously defined U-Net architecture by adding convolution layers after each upsampling step in order to capture essential features of the upsampled layers:

Fitting the data using the above defined network and training it:

From the above results it can be seen that the dice coefficient improves from 0.8 to 0.90 when trained for the same 20 epochs.

Loading the test image:

Resizing and reshaping the test image as per the model input requirements:

Predicting the mask label for the test image:

Reshaping and displaying the predicted mask:

Creating a binary mask by setting predicted pixel values greater than 0.9 to 1 and the rest to 0:

Displaying the binary mask:

Using OpenCV's bitwise_and operation to display the face regions corresponding to the predicted masks:
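The thresholding and masking steps can be illustrated with a small numpy sketch (`cv2.bitwise_and(img, img, mask=mask)` zeroes every pixel outside the mask, which is equivalent to the element-wise multiplication used here; the toy prediction values are made up):

```python
import numpy as np

# Hypothetical soft prediction from the network, values in [0, 1]
pred = np.array([[0.95, 0.40],
                 [0.10, 0.92]])

# Binarise: pixels with confidence above 0.9 become 1, the rest 0
binary_mask = (pred > 0.9).astype(np.uint8)

# Keep only the masked pixels of a (toy) 2x2 RGB image
image = np.full((2, 2, 3), 200, dtype=np.uint8)
face_only = image * binary_mask[..., None]
```
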

Fitting the model using validation data to evaluate the model while training:

As seen in the training above, the training dice coefficient rises above 0.9 while the validation dice coefficient shows little improvement, remaining in the range of 0.6 to 0.7. This could be due to a lack of sufficient training and validation data.

Evaluating the above trained model: A dice coefficient of 0.74 is obtained.

Predicting and displaying the mask for a sample test image:

Using data augmentation in order to generate more training samples for the network:

Creating variations in the input using width shift, height shift and zoom:
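With Keras, these augmentations are typically configured through `ImageDataGenerator`; the exact shift and zoom ranges below are assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Randomly shift images horizontally/vertically by up to 10% of their size
# and zoom in/out by up to 20% (illustrative values)
datagen = ImageDataGenerator(
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
)
```
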

Fitting the augmented data using the model and training on these samples:

From the above training, a slight improvement in the validation dice coefficient can be seen while training the network as compared to before.

Using a plain CNN instead of a pretrained transfer-learning architecture:

Fitting the image generator data using the model:

The training has been interrupted as seen above due to poor performance and no improvement in the training dice coefficient.

Training on the given dataset using EfficientNet:

Using the snapshot-ensembling technique and cosine annealing to obtain cyclic learning rates in order to perform stochastic weight averaging:
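Cosine annealing with warm restarts can be sketched as a plain function of the epoch number (the cycle count and maximum learning rate below are illustrative):

```python
import math

def cyclic_cosine_lr(epoch, total_epochs, n_cycles, lr_max):
    # Cosine annealing restarted every (total_epochs / n_cycles) epochs:
    # the LR falls from lr_max towards 0 within each cycle, then jumps back
    # up, so a snapshot taken at each cycle minimum lands in a different basin
    epochs_per_cycle = total_epochs // n_cycles
    t = epoch % epochs_per_cycle
    return lr_max / 2 * (math.cos(math.pi * t / epochs_per_cycle) + 1)
```
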

Using the EfficientNetB4 pretrained transfer-learning model with ImageNet weights:

Generating the summary of the model:

Adding the convolution and residual blocks in order to perform downsampling and upsampling:

Defining stochastic weight averaging class:
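At its core, stochastic weight averaging is an element-wise mean over weight snapshots collected at the end of the chosen epochs; a minimal numpy sketch:

```python
import numpy as np

def average_weights(weight_snapshots):
    # weight_snapshots: list of per-epoch weight lists (one array per layer);
    # SWA simply takes the element-wise mean of each layer across snapshots
    return [np.mean(np.stack(layer_versions), axis=0)
            for layer_versions in zip(*weight_snapshots)]
```
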

Fitting the model using the snapshot-ensembling technique so that stochastic weight averaging can be applied over the last 3 epochs:

Using the ResNet50 pretrained transfer-learning model as the backbone for the U-Net architecture:

Fitting the train generator data using the model and observing the performance:

As seen above, the validation dice coefficient is very low while the training dice coefficient is high.

Using ResNet34 architecture as the backbone:

Fitting the train generator data using the defined model:

Part 2: i) Importing the data:

Unzipping the data:

Creating the metadata:

Defining the data generator for positive, negative and anchor images:

Defining the preprocessing class for preprocessing the images:

Aligning faces of the images in the dataset using AlignDib class:

Using inception network architecture:

Creating the backbone of the siamese network to obtain the embedding vectors, which are then used to train the siamese network:

Defining the siamese network input and the triplet loss for the positive, anchor and negative images:
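The triplet loss pulls the anchor–positive distance below the anchor–negative distance by at least a margin alpha; a numpy sketch (the margin value 0.2 is an assumption):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # max(0, d(a,p)^2 - d(a,n)^2 + alpha): the positive must sit at least
    # alpha closer to the anchor than the negative does
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(pos_dist - neg_dist + alpha, 0.0)
```
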

Training the network from scratch using the data generator defined above:

The loss remains around 0.8 as seen above.

Storing all the embedding vectors in a list:

Using pretrained weights for the above model:

Saving all embedding vectors from the pretrained model's predictions in one list:

Defining the VGG architecture:

Loading the pretrained weights in the model:

Combining the model input with the output of the second-last layer, which gives the embedding vector:

Defining the function for preprocessing the images:

Defining the cosine-similarity and Euclidean-distance functions for measuring the distance between two images and thus inferring how similar the persons in the images are:
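A minimal numpy version of the two distance measures (here cosine similarity is expressed as a distance, 1 − cos θ, so that smaller values mean more similar faces, matching the thresholding described below):

```python
import numpy as np

def cosine_distance(u, v):
    # 1 - cos(theta); small values indicate matching embeddings/faces
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def euclidean_distance(u, v):
    # Straight-line distance between the two embedding vectors
    return np.linalg.norm(u - v)
```
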

Generating the distances between sample images and verifying whether the persons in the images are the same, using epsilon as the threshold value given below:

As seen above, the cosine similarity is 0.36, which is less than the threshold, indicating that they are the same person. The Euclidean distance is 0.87.

Defining the siamese network using the embedding vectors obtained from the pretrained VGG model:

Compiling the network using Adam optimizer:

As seen below the loss remains around 0 which is better than that trained using Inception network above.

Generating embedding vectors using the preprocess_input1 function for preprocessing the input images:

Generating embedding vectors using preprocess_image as the preprocessing function for the input images:

Defining the Euclidean-distance function for generating distances for image pairs:

Generating distance for sample images using the above embedding vectors:

Saving the distances for each pair of images in the training set in order to set the threshold for the Euclidean distance. The threshold is the point where the F1 score is at its maximum:
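The threshold search can be sketched as a sweep over candidate values, keeping the one with the best F1 score (pairs closer than the threshold are predicted as the same person; the toy distances in the test are made up):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(distances, same_person, candidates):
    # Predict "same person" when distance < t; keep the t with the best F1
    scores = [f1_score(same_person, (distances < t).astype(int))
              for t in candidates]
    return candidates[int(np.argmax(scores))]
```
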

The training has been interrupted above due to the large number of training samples.

Using the distances already obtained before the interruption to determine the threshold:

The threshold is found to be 0.554.

Using SVM Classifier for prediction of faces:

Saving the target labels in a separate array:

Encoding the target labels:

Saving the train and test indices in separate lists to split the data into train and test inputs: data with odd indices are saved in the train list while data with even indices are saved in the test list.
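A minimal sketch of the odd/even index split (the placeholder list stands in for the real embedding data):

```python
# Placeholder list standing in for the real embedding data
data = list(range(10))

train_idx = [i for i in range(len(data)) if i % 2 == 1]  # odd indices -> train
test_idx = [i for i in range(len(data)) if i % 2 == 0]   # even indices -> test
```
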

Comparing the performance of the SVM without PCA on the embedding vectors obtained from the VGG network and the inception networks respectively:

Saving the encoded labels in output train and test arrays:

Defining the SVM Classifier using C=10:
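A self-contained sketch of this classifier setup using scikit-learn's `SVC` with `C=10`, on toy two-cluster data standing in for the real embedding vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data standing in for the real embedding vectors
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 4) + 2, rng.randn(20, 4) - 2])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(C=10)  # C=10 as in the notebook
clf.fit(X, y)
acc = clf.score(X, y)
```
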

Fitting SVM on the data obtained from VGG predicted embedding vectors:

Generating the accuracy score from predictions made on the test data:

Fitting the SVM classifier on the embedding vectors obtained from the inception network:

Obtaining the accuracy score from predictions made on the test data:

As seen above, the SVM performs better on the embedding vectors obtained from the VGG network.

Using Logistic Regression as the Classifier and fitting on the data obtained from the inception network:

Obtaining the accuracy score using test data predictions:

Fitting the logistic classifier on the vectors obtained from the VGG network:

Obtaining the accuracy score:

As seen above, the logistic regression trained on the vectors obtained from VGG performs better than that trained on the vectors obtained from inception network. It also performs slightly better as compared to the SVM Classifier.

Generating a sample prediction:

As seen above, the image has been identified and predicted correctly.

ii) Using PCA:

Obtaining the cumulative-variance plot to infer the reduced number of components that still provide more than 95% of the variance:
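The cumulative-variance computation behind such a plot can be sketched as follows (random data stands in for the real embedding vectors, so the component count it yields is not the notebook's 60):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(200, 50)  # toy stand-in for the embedding vectors

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
n_95 = int(np.argmax(cumvar >= 0.95)) + 1  # smallest count reaching 95% variance
```
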

As seen above, 95% variance is reached at around 60 components, after which only very minor gains can be observed as the number of components increases.

The PCA below is fit using 100 components:

Splitting the PCA data using test size of 0.5:

Defining the SVM classifier using C=10 and fitting the classifier using train input and outputs:

Obtaining the accuracy from predictions on the test data:

The accuracy obtained above is around 90.54%.

Displaying the test images to be predicted:

Preprocessing the images as per the requirements for VGG model to generate the corresponding embedding vectors:

Generating the embedding vectors for both images:

Predicting the labels for the embedding vectors obtained above for the images:

Obtaining the labels through inverse transform of encoded values:

The labels above have been predicted correctly.

Implementing PCA transformation for the generated vectors:

Predicting the labels for the PCA-transformed vectors with the SVM classifier trained on the PCA-transformed data:

Carrying out hyperparameter tuning to find the optimal parameters:

Obtaining the best parameters: the best parameters are C=10, gamma=0.001 and kernel='rbf'.
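A sketch of such a grid search with scikit-learn's `GridSearchCV` (the toy data and the reduced grid are illustrative; the notebook's search reportedly selected C=10, gamma=0.001, kernel='rbf'):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 4) + 2, rng.randn(30, 4) - 2])
y = np.array([0] * 30 + [1] * 30)

# Reduced, illustrative grid; the real search would cover more values
param_grid = {"C": [1, 10], "gamma": [0.001, 0.01], "kernel": ["rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
```
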

Building the SVM Classifier using the optimal parameters:

Obtaining the accuracy score from predictions on the test data:

Conclusions: Part 1: The MobileNet architecture combined with data augmentation was found to perform best compared to the other networks. Using upsampling together with convolution yielded better results than upsampling alone. The data samples were few in number, as a result of which the validation dice coefficient was low and overfitting could also be observed. Part 2: The SVM trained on VGG-network embedding vectors was found to perform better than the one trained on inception-network vectors.